Abstract - Correlation and similarity measures are widely used in all areas of the sciences and social sciences. Often the variables are not numbers but qualitative descriptors, called categorical data. We define and study the similarity matrix, as a measure of similarity, for the case of categorical data. This is of interest because of the deluge of categorical data in the public domain, such as movie ratings, top-10 rankings and data from social media, that requires analysis. We show that the statistical properties of the spectra of similarity matrices constructed from categorical data follow those from random matrix theory. We demonstrate this approach by applying it to data on the Indian general elections and on sea level pressure over the North Atlantic ocean.


Introduction. — The study of correlations is an integral part of almost every branch of science and social science. Correlated systems and phenomena, such as non-equilibrium systems, present a rich variety of behaviour not normally seen in uncorrelated systems. Global climate patterns depend on the spatial and temporal correlations among atmospheric variables [1], correlations in stock market records indicate clustering of stocks and indices [2-5], and correlations among EEG channels might indicate the health of the subject [6,7]. In computer science, correlations are an integral part of most clustering algorithms [8]. In these examples, the object of central interest is the same-time correlation function $\langle x(t) y(t) \rangle$ for two stationary stochastic processes $x(t)$ and $y(t)$ with zero mean. The processes $x(t)$ and $y(t)$ could be measured data or generated through simulations. When $N$ variables $x_i(t)$, $i = 1, 2, \ldots, N$, are present, the correlation matrix $\mathbf{C}$ is the appropriate generalisation, in which each matrix element represents the correlation between the variables $x_i(t)$ and $x_j(t)$ [9]. Singular value decomposition, empirical orthogonal functions and Karhunen-Loeve decomposition are all variants of this correlation matrix approach.

Random matrix theory (RMT) [10] has emerged as an important tool for understanding the spectra of correlation matrices [7]. It is by now well established that the spectrum of an empirical correlation matrix is, for the most part, well described by random matrix results [?]. Deviations from random matrix behaviour indicate the presence of significant information [4] that cannot be explained purely by assumptions of randomness in the matrix elements. All these methods and analyses, based on correlations and RMT, depend on the variables $x_i(t)$ being series of numbers representing some possibly stochastic phenomenon.

The main objective of this paper is to analyse a measure of association, or similarity, for multivariate data sets that are not numbers but discrete qualitative indicators. Movie ratings and top-ten rankings are examples of such indicators. Even more challenging cases arise when the discrete indicators cannot be ranked in any numerical order. For instance, the responses in an opinion poll cannot be assigned any meaningful ranking order. All such data sets are called categorical data [14]. Given the deluge of data of various kinds available in the public domain over the internet, it is imperative to look for methods to analyse categorical data effectively. One important application is the analysis of data from social media, such as Facebook posts, Twitter updates and blogs, which are mostly not in numerical form. Social media analysis is now widely used by corporations and even governments to understand the public perception of their brand value, products and services. Hence, computing measures of association with such non-numerical data is often necessary. Recently, random walks and network theory have been used for computing such measures [15]. In contrast, here we develop a statistical technique that is analogous to the correlation matrix formalism and apply RMT tools.

Generally, multivariate empirical data is highly noisy and redundant. Thus, it is important to separate the information content from the noise components. To do this, we obtain the similarity matrix $\mathbf{S}$ as a multivariate generalisation of the similarity measure. We note that the similarity matrix $\mathbf{S}$ is widely applied in clustering algorithms in computer science [8] and for classifying genetic data [16]. By comparing the statistical properties of the spectra of $\mathbf{S}$ with those from an appropriate ensemble of RMT, we can identify the eigenvalues and eigenmodes that are random. The spectral components that deviate from RMT results are not random and generally contain system-specific information. We apply the formalism to two real-world systems: (i) the results of the Indian general elections and (ii) the mean sea level pressure over the North Atlantic region.

Formalism. — In this section, we introduce the formalism for a similarity measure and its multivariate generalisation. We consider a time series $x_t$ of categorical data. The elements of the time series are chosen from $p$ possible objects denoted by the numbers $1$ to $p$. Note that the labels $1$ to $p$ do not affect the value of the measure. For example, $x_t$ could be the time series of parties winning elections in a city. If only two parties (objects), denoted by 1 and 2, have won elections in that city, then the time series could take the form $x_t = 1, 2, 1, 1, 2, 1, 2, 2, 1, \ldots$ For two time series $x_t$ and $y_t$ of length $T$, we define the similarity measure as

$$ C_{xy} = \frac{1}{\mathcal{N}} \sum_{t=1}^{T} w_t \, \delta_{x_t, y_t}, \qquad (1) $$

where the normalisation constant is $\mathcal{N} = \sum_{t=1}^{T} w_t$ and $\delta_{a,b}$ is the Kronecker delta ($\delta_{a,b} = 1$ if $a = b$, and $0$ if $a \neq b$). Here, $w_t$ are the weights assigned to each data point. In most applications, every data point is given equal weight and hence $w_t = 1$ for all $t = 1, 2, 3, \ldots, T$. Clearly, $C_{xy} = 1$ only if $x_t = y_t$ for all $t$, and $C_{xy} = 0$ implies $x_t \neq y_t$ for all $t$. If $0 < C_{xy} < 1$, then $x_t$ and $y_t$ agree at some time steps and differ at others, with $C_{xy}$ giving the weighted fraction of agreements. Note that $C_{xy}$ is similar to the Jaccard index [17,18] used to measure the similarity of finite sample sets.
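
As an illustration of Eq. (1), the following is a minimal Python sketch, assuming NumPy is available. The function name `similarity` and the second series `y` are hypothetical choices made for this example; only the series `x` is quoted from the text above.

```python
import numpy as np

def similarity(x, y, w=None):
    """Weighted fraction of time steps at which two categorical series agree (Eq. 1)."""
    x = np.asarray(x)
    y = np.asarray(y)
    if x.shape != y.shape:
        raise ValueError("series must have the same length")
    # Equal weights w_t = 1 unless weights are supplied
    w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)
    # (x == y) plays the role of the Kronecker delta at each time step
    return float(np.sum(w * (x == y)) / np.sum(w))

x = [1, 2, 1, 1, 2, 1, 2, 2, 1]   # series quoted in the text
y = [1, 1, 1, 2, 2, 1, 2, 1, 1]   # hypothetical second series
c_xy = similarity(x, y)           # the series agree at 6 of the 9 time steps
```

Because only equality of labels is tested, relabelling the objects (for instance with party names instead of integers) leaves the value unchanged.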

Next, we consider a multivariate scenario with $N$ variables $x_i$, $i = 1, 2, \ldots, N$, each being a time series of length $T$. This can be elegantly handled in matrix notation. Let $\mathbf{D}$ represent a data matrix with $T$ rows and $N$ columns, in which each column is a time series. We define a new operator $\ast$ through its action on two vectors $\mathbf{a} = (a_1 \; a_2 \; \ldots \; a_T)$ and $\mathbf{b} = (b_1 \; b_2 \; \ldots \; b_T)$:

$$ \mathbf{a} \ast \mathbf{b} = \delta_{a_1, b_1} + \delta_{a_2, b_2} + \ldots + \delta_{a_T, b_T}. \qquad (2) $$

This is similar to applying an element-wise logical AND operation between the two vectors. Using this operator, the multivariate generalisation of the similarity measure is

$$ \mathbf{S} = \frac{1}{T} \, \mathbf{D}^{T} \ast \mathbf{D}. \qquad (3) $$

In this form, $\mathbf{S}$ has a structure similar to that of the Wishart matrix $\mathbf{C} = \mathbf{D}^{T} \mathbf{D}$ in multivariate statistics [19]. In particular, $\mathbf{S}$ is also a positive semi-definite matrix with eigenvalues $\lambda \geq 0$. To study the spectra of $\mathbf{S}$, we numerically solve the eigenvalue equation $\mathbf{S} \mathbf{x}_i = \lambda_i \mathbf{x}_i$ and obtain its eigenvalues $\lambda_i$ and eigenvectors $\mathbf{x}_i$.
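
The construction of the similarity matrix from a data matrix can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the function name `similarity_matrix`, the toy data and the $1/T$ normalisation (chosen so that the diagonal elements equal 1) are assumptions of this example.

```python
import numpy as np

def similarity_matrix(D):
    """S[i, j] = fraction of rows (time steps) at which columns i and j of D agree."""
    D = np.asarray(D)
    T, N = D.shape
    S = np.zeros((N, N))
    for t in range(T):
        row = D[t]
        # Pairwise equality at time t: 1 where variables i and j show the same object
        S += (row[:, None] == row[None, :])
    return S / T

rng = np.random.default_rng(0)
D = rng.integers(1, 5, size=(50, 6))     # toy data: T = 50, N = 6, labels 1..4
S = similarity_matrix(D)
eigvals, eigvecs = np.linalg.eigh(S)     # S is real and symmetric
```

Since every diagonal element is 1, the eigenvalues sum to $N$, which gives a quick sanity check on a computed spectrum.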



Fig. 1: (Colour online) Numerically computed $\lambda_{max}$ (circles) as a function of the number of variables $N$ (left) and the number of objects $p$ (right). Panels (a,b) are for the uniform distribution and (c,d) are for the geometric distribution of random numbers. The solid lines are the analytical results in Eqs. (9) and (11).

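Similarity matrix. — This formalism can be illustrated with a simple solvable model. Consider $p$ discrete objects, labelled $1$ to $p$, and $N$ random variables. Each variable $x_i(t)$, $i = 1, 2, \ldots, N$, is a time series with elements drawn from a discrete probability distribution $P(\phi)$ for $\phi = 1, 2, 3, \ldots, p$, with $P(\phi) = 0$ otherwise. Then, the elements of $\mathbf{S}$ for $i \neq j$ will be

$$ s_{ij} = \frac{1}{T} \sum_{t=1}^{T} \delta_{x_i(t), x_j(t)} \approx \sum_{\phi'=1}^{p} \sum_{\phi=1}^{p} P(\phi) \, P(\phi') \, \delta_{\phi, \phi'}. \qquad (4) $$

Note that $s_{ij}$ turns out to be some function of $p$. Clearly, by construction, the diagonal elements are $s_{ii} = (1/T) \sum_{t} \delta_{x_i(t), x_i(t)} = 1$. In the limit $T \to \infty$, $s_{ij}$ would have converged and we get $\mathbf{S}$ to be a matrix of order $N$ of the form

$$ \mathbf{S} = \begin{pmatrix} 1 & a & a & \ldots \\ a & 1 & a & \ldots \\ a & a & 1 & \ldots \\ \vdots & & & 1 \end{pmatrix}. \qquad (5) $$

The off-diagonal elements are $s_{ij} = a = a(p)$. The eigenvalues of $\mathbf{S}$ can be obtained analytically. There are only two distinct eigenvalues,

$$ \lambda_{max} = 1 + (N - 1) a, \quad \text{and} \quad \lambda_1 = 1 - a. \qquad (6) $$

This simple estimate shows that $\lambda_{max}$ is the dominant eigenvalue and that the other eigenvalue is $(N-1)$-fold degenerate. We also note that the normalised eigenvector corresponding to $\lambda_{max}$ is

$$ \frac{1}{\sqrt{N}} \, (1 \; 1 \; 1 \; \ldots \; 1 \; 1). \qquad (7) $$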
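
The two-eigenvalue structure of the limiting matrix can be checked numerically. The sketch below builds a matrix with unit diagonal and a common off-diagonal value; the values of `N` and `a` are illustrative choices, not taken from the paper.

```python
import numpy as np

N, a = 8, 0.25                                   # illustrative values only
# Unit diagonal, every off-diagonal element equal to a
S = (1 - a) * np.eye(N) + a * np.ones((N, N))

vals, vecs = np.linalg.eigh(S)                   # eigenvalues in ascending order
lam_max = vals[-1]                               # expected: 1 + (N - 1) * a
lam_rest = vals[:-1]                             # expected: N - 1 copies of 1 - a
# Fix the arbitrary overall sign of the dominant eigenvector
v_max = vecs[:, -1] * np.sign(vecs[0, -1])       # expected: all components 1 / sqrt(N)
```

The check works because the matrix is $(1-a)$ times the identity plus $a$ times the all-ones matrix, whose only non-zero eigenvalue is $N$, with the constant vector as eigenvector.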

Now, we can apply this formalism to the case in which the time series $x_i(t)$ are drawn from a discrete uniform distribution of the form

$$ P(\phi) = \frac{1}{p}, \qquad \phi = 1, 2, \ldots, p. \qquad (8) $$

Then, using Eq. (4), the elements of the similarity matrix are $s_{ij} = a = 1/p$ for all $i \neq j$, and the eigenvalues are

$$ \lambda_{max} = 1 + (N - 1)/p \quad \text{and} \quad \lambda_1 = 1 - 1/p. \qquad (9) $$

Next, we consider the geometric distribution given by

$$ P(\phi) = (1 - q)^{\phi - 1} q, \qquad \phi = 1, 2, 3, \ldots, \qquad (10) $$

where $0 < q < 1$. Note that, unlike the uniform distribution (Eq. (8)), the geometric distribution has infinite support. Hence we choose $q$ such that the objects $\phi = 1, 2, 3, \ldots, p$ carry almost all of the probability, i.e., $1 - \sum_{\phi=1}^{p} P(\phi) < 10^{-4}$. Then, $a \approx q/(2 - q)$. Using the empirical relation $q \approx 6/p^{0.9}$, we get $s_{ij} = a \approx 3/(p^{0.9} - 3)$. Then, the eigenvalues are

$$ \lambda_{max} \approx 1 + \frac{3 (N - 1)}{p^{0.9} - 3} \quad \text{and} \quad \lambda_1 \approx 1 - \frac{3}{p^{0.9} - 3}. \qquad (11) $$

We will use these results as benchmarks to compare with the spectra computed from the random similarity matrix.
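
The off-diagonal value $a$ in both cases follows from the double sum over $P(\phi) P(\phi') \delta_{\phi,\phi'}$, which collapses to $\sum_\phi P(\phi)^2$. The short derivation below is a sketch under two assumptions: the geometric distribution is parameterised as $P(\phi) = (1-q)^{\phi-1} q$, and the truncation of its support at $p$ is ignored. The symbol $\alpha$ is a hypothetical placeholder for the exponent in the empirical relation between $q$ and $p$.

```latex
% Uniform case: P(\phi) = 1/p on \phi = 1, ..., p
\begin{align}
a &= \sum_{\phi=1}^{p} \frac{1}{p^2} = \frac{1}{p}. \\
% Geometric case, assuming P(\phi) = (1-q)^{\phi-1} q with untruncated support
a &= \sum_{\phi=1}^{\infty} P(\phi)^2
   = q^2 \sum_{\phi=1}^{\infty} (1-q)^{2(\phi-1)}
   = \frac{q^2}{1-(1-q)^2}
   = \frac{q}{2-q}. \\
% Substituting an empirical relation of the form q = 6 / p^{\alpha}
a &= \frac{6/p^{\alpha}}{2 - 6/p^{\alpha}} = \frac{3}{p^{\alpha} - 3}.
\end{align}
```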

Random similarity matrix. — In this section, we study the spectra of the random similarity matrix $\mathbf{S}_r$ in detail. In particular, we compare the statistical properties of $\mathbf{S}$ with those obtained from the random similarity matrix. We define the random similarity matrix $\mathbf{S}_r$ for the case of $p$ objects (labelled by integers $1$ to $p$) and $N$ variables as follows. Let $\mathbf{D}_r$ be a $T \times N$ matrix whose elements are independent and identically distributed integers in the range $[1, p]$ drawn from a discrete probability distribution function. Then, as in Eq. (3), $\mathbf{S}_r = \frac{1}{T} \mathbf{D}_r^{T} \ast \mathbf{D}_r$ is the random similarity matrix of order $N$.

First, we look at how the number of objects $p$ and the number of variables $N$ affect the spectrum of $\mathbf{S}_r$. We consider $p = 40$ and $p = 400$ objects with $N = 1000$ variables and time series of length $T = 2000$. All the simulation results (solid circles in Fig. 1(a-d)) have been averaged over 100 realisations of the appropriate similarity matrix. Fig. 1(a,b) shows the variation of $\lambda_{max}$ as a function of the number of variables $N$ and the number of objects $p$ for the uniform distribution. Surprisingly, the value of $\lambda_{max}$ predicted by Eq. (9), shown as a solid line in this figure, holds good even when the elements of $\mathbf{S}_r$ are noisy due to the finite length of the time series. Fig. 1(c,d) shows $\lambda_{max}$ for the geometric distribution. In this case, the number of objects $p$ is approximate and yet the semi-analytical estimate for the dominant eigenvalue (Eq. (11)) is in good agreement with the simulated results. In general, $\lambda_{max}$ decreases with $p$ because, as the number of objects increases, the probability that two time series show the same object at a given time decays. For a finite number of objects, this decay can be approximated as $p^{-1}$ for the uniform and $p^{-0.9}$ for the geometric distribution of random numbers. In the limit $p \to \infty$, there is only one distinct eigenvalue, $\lambda = 1$, and it is $N$-fold degenerate.
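
The comparison between a simulated dominant eigenvalue and the uniform-distribution prediction $1 + (N-1)/p$ can be sketched as below. This is a single small realisation under assumed parameters (`N`, `T` and the seed are chosen to run quickly and are smaller than the values used in the text), and the helper name `random_similarity_matrix` is hypothetical.

```python
import numpy as np

def random_similarity_matrix(N, T, p, rng):
    """Similarity matrix of N i.i.d. uniform categorical series of length T."""
    D = rng.integers(1, p + 1, size=(T, N))   # labels 1..p, one column per variable
    S = np.zeros((N, N))
    for phi in range(1, p + 1):
        M = (D == phi).astype(float)          # indicator of object phi, shape (T, N)
        S += M.T @ M                          # times at which both variables show phi
    return S / T

rng = np.random.default_rng(1)
N, T, p = 200, 400, 40                        # smaller than N = 1000, T = 2000 in the text
S_r = random_similarity_matrix(N, T, p, rng)
eigvals = np.linalg.eigvalsh(S_r)             # ascending order
lam_max_sim = eigvals[-1]                     # simulated dominant eigenvalue
lam_max_pred = 1 + (N - 1) / p                # uniform-distribution prediction
```

Summing indicator products over the objects reproduces the Kronecker-delta count for every pair of variables, so this is the same matrix as the pairwise definition, computed with matrix products.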

Eigenvalue Density. We study two quantities that characterise the eigenvalues of $\mathbf{S}_r$, namely, the eigenvalue density and the spacing distribution. The mean density of eigenvalues is defined by

$$ \rho(\lambda) = \frac{1}{N} \sum_{i} \delta(\lambda - \lambda_i), \qquad (12) $$

where $\delta(\cdot)$ is the Dirac delta function. Given that Eq. (3) has a structure similar to that of the Wishart matrix, it is reasonable to expect the density of eigenvalues for the random Wishart matrix $\mathbf{C} = \mathbf{D}^{T} \mathbf{D}$ to hold good for the random similarity matrix as well. In the Wishart case, if $\mathbf{D}_r$ is a $T \times N$ random matrix with uncorrelated column vectors drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, then $\rho(\lambda)$, in the limit $N \to \infty$ and $T \to \infty$ with $Q = T/N \geq 1$, is the Marchenko-Pastur law [?],

$$ \rho(\lambda) = \frac{Q}{2 \pi \sigma^2} \, \frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, \qquad (13) $$

for $\lambda \in [\lambda_{min}, \lambda_{max}]$ and $\rho(\lambda) = 0$ otherwise. Here, the largest and smallest eigenvalues are

$$ \lambda_{max/min} = \lambda_{\pm} = \sigma^2 \left( 1 + 1/Q \pm 2 \sqrt{1/Q} \right). \qquad (14) $$

In the limit $Q = 1$, the eigenvalue density leads to the well-known Wigner semi-circle law [20]. We compare Eq. (13) with the eigenvalue density computed for the random similarity matrix.

Fig. 2: (Colour online) Eigenvalue density for $p = 40$ and $p = 400$. The histograms are from simulations and the solid curves represent Eq. (13). The inset shows part of the main graph to focus on the dominant eigenvalues (indicated by arrows), which are far from the bulk of the eigenvalues.
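
For readers who wish to overlay the Marchenko-Pastur density on a histogram of eigenvalues, a minimal sketch of the density and its edges is given below. The function name `marchenko_pastur` and the value `Q = 2.0` are illustrative assumptions; how $Q$ and $\sigma^2$ should be chosen or fitted for a similarity matrix is not specified here.

```python
import numpy as np

def marchenko_pastur(lam, Q, sigma2=1.0):
    """Marchenko-Pastur density at lam for Q = T/N >= 1, with the band edges."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lam_min = sigma2 * (1 + 1 / Q - 2 * np.sqrt(1 / Q))
    lam_max = sigma2 * (1 + 1 / Q + 2 * np.sqrt(1 / Q))
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)      # density vanishes outside the band
    x = lam[inside]
    rho[inside] = Q / (2 * np.pi * sigma2) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho, lam_min, lam_max

Q = 2.0                                             # illustrative value
_, lam_min, lam_max = marchenko_pastur([1.0], Q)
grid = np.linspace(lam_min, lam_max, 200001)
rho, _, _ = marchenko_pastur(grid, Q)
# Trapezoidal estimate of the total probability; should be close to 1
area = float(np.sum((rho[1:] + rho[:-1]) * np.diff(grid)) / 2)
```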

The eigenvalue density for the bulk of the eigenvalues of the random similarity matrix $\mathbf{S}_r$ is shown in Fig. 2 and is well described by Eq. (13). On the other hand, the largest eigenvalue $\lambda_{max}$, highlighted in the inset of Fig. 2, is an order of magnitude larger than all the other eigenvalues and stands out from the bulk. This is consistent with Eq. (6), since every off-diagonal element of $\mathbf{S}_r$ has the same positive mean $a$. This is a unique spectral signature of the random similarity matrix $\mathbf{S}_r$. A matrix such as $\mathbf{S}_r$ that encodes random correlations is, in the spirit of random matrix theory, not expected to accord special treatment to any part of the spectrum. Yet, the dominant eigenvalue $\lambda_{max}$ has a special place in the spectrum. The $\rho(\lambda)$ for Poisson, binomial and geometric distributions of random numbers, shown in Fig. 3, also displays a similar feature for $\lambda_{max}$. In this case too (Fig. 3), the bulk of the eigenvalues is reasonably consistent with Eq. (13). The mild deviation for the geometric distribution in this figure can be attributed to the approximate nature of the calculation due to its infinite support.

Fig. 3: (Colour online) Eigenvalue density for $\mathbf{S}_r$ with random variables from discrete distributions: Poisson, binomial and geometric. Histograms are from simulations and solid lines are fitted using Eq. (13). The inset shows $\rho(\lambda)$, focussing on the dominant eigenvalues (highlighted by arrows).

Spacing distribution. In this section, we present results for the spacing distribution of the eigenvalues. We remove the dominant eigenvalue $\lambda_{max}$ and compute the spacing distribution using all the other eigenvalues in the bulk (see Figs. 2 and 3). If the eigenvalues of $\mathbf{S}_r$ are represented by $\lambda_i$, $i = 1, 2, \ldots, N$, we transform them to obtain the 'unfolded' eigenvalues $\epsilon_i$, $i = 1, 2, \ldots, N$. The nearest neighbour spacings are defined as $s_i = \epsilon_{i+1} - \epsilon_i$ such that $\langle s \rangle = 1$. Given that $\mathbf{S}_r$ is a real symmetric matrix with random entries, we expect the empirical spacing distribution obtained from the spectra of $\mathbf{S}_r$ to be best described by the Gaussian Orthogonal Ensemble (GOE) result of random matrix theory, the Wigner distribution [10]. Hence, the appropriate result is

$$ P_W(s) = \frac{\pi}{2} \, s \, \exp\left( -\frac{\pi s^2}{4} \right). \qquad (15) $$

In Fig. 4 we show the computed spacing distribution for the eigenvalues of $\mathbf{S}_r$ with the matrix elements of $\mathbf{D}_r$ drawn from discrete uniform and geometric distributions. In both cases, the spacing distributions follow the random matrix theory result in Eq. (15). It must be recalled that similar results hold good for the spacing distribution of empirical correlation matrices [?]. We further note that as $T \to \infty$, the matrix elements of $\mathbf{S}_r$ converge to their true values and the spacing distribution deviates strongly from Eq. (15).
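
The text does not state how the eigenvalues are unfolded. One common choice, used here purely as an assumption, is to fit a smooth polynomial to the cumulative eigenvalue count and evaluate it at each eigenvalue. The sketch below applies this to a small GOE-type test matrix; the function names, the polynomial degree and the matrix size are all hypothetical choices for this example.

```python
import numpy as np

def wigner_surmise(s):
    """GOE nearest-neighbour spacing distribution (Wigner distribution)."""
    s = np.asarray(s, dtype=float)
    return (np.pi / 2) * s * np.exp(-np.pi * s**2 / 4)

def unfolded_spacings(eigvals, degree=7):
    """Unfold with a polynomial fit to the staircase; return spacings with mean 1."""
    lam = np.sort(np.asarray(eigvals, dtype=float))
    staircase = np.arange(1, len(lam) + 1)         # cumulative eigenvalue count
    z = (lam - lam.mean()) / lam.std()             # rescale for a well-conditioned fit
    eps = np.polyval(np.polyfit(z, staircase, degree), z)   # unfolded eigenvalues
    s = np.diff(eps)
    return s / s.mean()                            # enforce <s> = 1

rng = np.random.default_rng(2)
A = rng.normal(size=(300, 300))
H = (A + A.T) / 2                                  # real symmetric test matrix
s = unfolded_spacings(np.linalg.eigvalsh(H))

# Numerical check that the Wigner distribution has unit norm and unit mean
grid = np.linspace(0.0, 10.0, 100001)
f = wigner_surmise(grid)
norm = float(np.sum((f[1:] + f[:-1]) * np.diff(grid)) / 2)
g = grid * f
mean_s = float(np.sum((g[1:] + g[:-1]) * np.diff(grid)) / 2)
```

A histogram of `s` can then be compared with `wigner_surmise`. For a similarity matrix, the dominant eigenvalue would be removed before unfolding, as described in the text.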

Eigenvector statistics and Information Entropy. In this section, we study the properties of the eigenvectors $\mathbf{x}_i$, $i = 1, 2, \ldots, N$, of $\mathbf{S}_r$. The components of the eigenvectors corresponding to the eigenvalues in the bulk are Gaussian distributed (not shown here), in accordance with random matrix results [10]. A comprehensive comparison with random matrix results can be done by computing the information entropy for the $i$-th eigenvector, defined by [?]

$$ H_i = - \sum_{j} |x_{ij}|^2 \ln |x_{ij}|^2. \qquad (16) $$

The corresponding random matrix average for the information entropy is approximately $\ln(N/2)$, where $N$ is the dimension of the random matrix. We show the information entropy $H_i$ as a function of the eigenvalue index $i$ in Fig. 5. The information entropy $H_i$ for the bulk of the eigenvectors follows the random matrix result $H_i \approx \ln 500 = 6.214$, indicated as a blue line. As an instance of such an eigenvector in the bulk, we show in Fig. 5(c) the components $x_{999,j}$ of the 999th eigenvector. The random nature of this eigenvector is clearly visible in its oscillations about zero. This behaviour must be contrasted with that of the dominant eigenvector (corresponding to $\lambda_{max}$), $x_{1000,j}$, shown in Fig. 5(b). In this case, though the oscillations exist, they are not about zero, i.e., all the components of this eigenvector have identical phase. This behaviour can be understood from the fact that for the simple model in Eq. (5), obtained in the $T \to \infty$ limit, the dominant eigenvector has the form shown in Eq. (7). Note that the phases of all the components are identical in Eq. (7) as well. For the dominant eigenvector of $\mathbf{S}_r$ the amplitudes become random but not the phases. These non-random phases lead to a significant deviation from the random matrix average of the information entropy, as indicated by the arrow in Fig. 5(a). Thus, deviations from random matrix results imply the presence of correlations either in the amplitude or in the phase of the eigenvectors. This, in turn, could be traced to the correlations in the similarity measure for many variables.

Fig. 4: (Colour online) Empirical spacing distribution for the eigenvalues obtained from $\mathbf{S}$ with elements of $\mathbf{D}$ drawn from (a) the uniform distribution and (b) the geometric distribution. In both cases, $p = 40$, $N = 1000$ and $T = 2000$. The solid curve is the Wigner distribution (Eq. (15)).
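
The information entropy of an eigenvector can be computed with a few lines of NumPy. The sketch below assumes the standard definition $H_i = -\sum_j |x_{ij}|^2 \ln |x_{ij}|^2$ for normalised eigenvectors stored as columns; the function name `information_entropy` is hypothetical.

```python
import numpy as np

def information_entropy(vecs):
    """Entropy of each column of vecs (columns are normalised eigenvectors)."""
    w = np.abs(np.asarray(vecs, dtype=float)) ** 2
    # Use the convention 0 * ln 0 = 0 for vanishing components
    safe_w = np.where(w > 0, w, 1.0)
    return -np.sum(w * np.log(safe_w), axis=0)

N = 1000
# Eigenvector with identical components 1/sqrt(N), as in the solvable model
flat = np.full((N, 1), 1 / np.sqrt(N))
H_flat = information_entropy(flat)[0]      # equals ln N
H_rmt = np.log(N / 2)                      # approximate random matrix average
```

For $N = 1000$ the flat eigenvector gives $\ln 1000 \approx 6.91$, above the random matrix value $\ln 500 \approx 6.21$, while an eigenvector localised on a single component gives zero.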

Application. — We apply the formalism to two different data sets: (i) the data of the Indian general elections and (ii) atmospheric pressure in the region of the North Atlantic ocean. We describe below the motivation for the choice of these data sets and their details.

Elections data. The general elections held in India to elect the lower house of the Indian parliament are the largest democratic exercise of their kind in the world, with about 814.5 million people eligible to vote. These elections elect 543 representatives from as many constituencies to the lower house, the Lok Sabha. For the purposes of our analysis, we identify 19 major political parties that have had significant representation in the elections held in India since 1984. These parties form our objects, i.e., $p = 19$. For each constituency, the data we employ is a time series of the winning party at the seven general elections held during 1984-2004, and hence $T = 7$. The number of variables is the number of constituencies, $N = 543$. The general elections data, dating back to the first one in 1952, is provided by the Election Commission of India [53], and all the analysis reported here is based on this data. In this scenario, the similarity measure is an index of how close the $i$-th and $j$-th constituencies are in terms of the parties they have elected in the series of general elections. For instance, $s_{ij} = 1$ implies that the $i$-th and $j$-th constituencies elected the same party at every one of these general elections.

Fig. 5: (Colour online) (a) Information entropy for the eigenvectors of $\mathbf{S}_r$ for the uniform distribution (black circles) and the geometric distribution (red squares). $H_i$ for the dominant eigenvector stands out from the bulk and is highlighted by an arrow. For $\mathbf{S}_r$ obtained from the uniform distribution, (b) shows the eigenvector of the dominant eigenvalue and (c) shows the eigenvector for an eigenvalue in the bulk.

We note that the length of the time series is small and hence the computed matrix $\mathbf{S}$ is singular. This is also evident from the fact that, out of 543 eigenvalues, only 91 are non-zero; these form the basis for the results of the eigenvalue statistics presented in Fig. 6(a,b). In Fig. 6(a), we show the computed eigenvalue density $\rho(\lambda)$. We note that, unlike in the cases shown in Figs. 2 and 3, many eigenvalues at both the lower and upper ends deviate from the random matrix formula (Eq. (13)). Even though the spacing distribution, shown in Fig. 6(b), largely follows the Wigner distribution, there are visible deviations as well. This could be attributed to poor statistics and to the fact that the election data is strongly correlated. This is further corroborated by the deviations in $H_i$ from random matrix results (shown as a red line in Fig. 6(c)).

Atmospheric pressure data. Now, we consider the sea level atmospheric pressure (SLP) data over the North Atlantic region. This region, and in particular this set of data, has been well studied in order to understand a pressure see-saw phenomenon called the North Atlantic oscillation. In contrast with the elections data, in which the political parties (objects) are discrete entities, in this case the SLP values (objects) are continuous. The formalism requires the objects to be discrete. Hence, we discretise the data as follows. If $a_{max}$ and $a_{min}$ represent the maximum and minimum observed values in the data, we create data intervals of width

$$ \Delta = \frac{a_{max} - a_{min}}{p}. \qquad (17) $$

Suppose a data value $a$ lies in, say, the 3rd interval $(a_{min} + 3\Delta, \, a_{min} + 4\Delta)$; then the discretised data corresponding to $a$ is 3. In signal processing, this technique of mapping continuous data to a countable set of integers is called quantization [53]. By this process, the entire data set is converted into time series of integers (representing data intervals). Since the observed data in any measurement is known to be contaminated by errors and instrumental noise, it is only fitting that intervals of observed values are analysed instead of the actual values.

Fig. 6: (Colour online) Results from the spectra of $\mathbf{S}$ obtained from data on the Indian general elections: (a) eigenvalue density for the elections data compared against that of the random similarity matrix, (b) spacing distribution obtained from the elections data and (c) information entropy. See text for details.
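
The discretisation step can be sketched as follows, assuming NumPy. The function name `quantize` and the sample pressure values are hypothetical. Intervals are indexed from 0 so that a value in $(a_{min} + 3\Delta, a_{min} + 4\Delta)$ maps to 3, matching the example in the text, and the maximum value is assigned to the last interval.

```python
import numpy as np

def quantize(data, p):
    """Map continuous values to integer interval labels 0 .. p-1 of equal width."""
    data = np.asarray(data, dtype=float)
    a_min, a_max = data.min(), data.max()
    delta = (a_max - a_min) / p                 # interval width
    if delta == 0:
        raise ValueError("data must not be constant")
    labels = np.floor((data - a_min) / delta).astype(int)
    return np.clip(labels, 0, p - 1)            # a_max falls in the last interval

pressures = [1000.0, 1003.5, 1007.0, 1010.0]    # hypothetical values in hPa
labels = quantize(pressures, p=10)              # interval width is 1.0 here
```

Since the similarity measure depends only on whether two labels are equal, starting the labels at 0 rather than 1 does not affect the resulting matrix.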

We use the NCEP reanalysis data of monthly mean sea level pressure at 434 grid points on the sea surface for the period 1948-2001 [?]. A correlation matrix analysis of this data from the point of view of random matrix theory was reported in Ref. [?]. The data has $N = 434$ variables and each variable has a time series of length $T = 624$. The number of objects (data intervals) is $p = 40$. The similarity index $s_{ij}$ in this case measures whether the variations of sea level pressure at the $i$-th and $j$-th geographical locations are similar. If $s_{ij} = 1$, then the discretised data values at these two locations are identical at all times.

Using this discretised data, we compute the matrix $\mathbf{S}$ and its spectra. For comparison, we also compute the spectra of its random matrix equivalent $\mathbf{S}_r$. Similar to the case of the elections data, the eigenvalue density shown in Fig. 7(a) (as a histogram) displays deviations from that of its random matrix equivalent (shown as a red curve) at the lower and upper ends. These deviating eigenvalues indicate correlations
example which lends itself readily to this analysis, but in the latter example the original data is quantized before analysis. Thus, the analysis described in this paper can be performed on most empirically available data sets.

* * * 

NCEP Reanalysis data provided by the 
NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, 
from their web site www.esrl.noaa.gov/psd. The data of 
Indian general elections is available from eci.nic.in. 


Fig. 7: (Colour online) Statistics of the spectra of $\mathbf{S}$ obtained from data of sea level pressure: (a) eigenvalue density for the atmospheric pressure data compared against that from the random similarity matrix, (b) spacing distribution (histogram) obtained from the data and the Wigner distribution (solid line), (c) information entropy from the data (solid circles) and its RMT average (red line). See text for details.

or system-specific information that cannot be modelled by randomness assumptions. In this case too, the spacing distribution shown in Fig. 7(b) agrees with the Wigner distribution $P_W(s)$. The eigenvectors corresponding to the deviating eigenvalues in Fig. 7(a) also display pronounced deviations from random matrix averages. This is seen in Fig. 7(c), which shows the information entropy as a function of the eigenvalue index. The dominant eigenvectors at the top end of the spectrum are known to capture the pressure patterns that are relevant in atmospheric sciences [?]. We also point out that the components of the dominant eigenvector of $\mathbf{S}$, for both the elections data and the SLP data, have identical phases (not shown here), in agreement with the result shown in Fig. 5(b).

Summary. — We have studied the problem of computing a measure of similarity for multivariate time series of categorical data, i.e., time series data sets that are not in numerical form. Such data sets are encountered in many situations in social media, for example as responses to major events or speeches, or as stars or recommendations given to movies, books or other such resources. We construct a similarity matrix $\mathbf{S}$ by assembling together the similarity measure $s_{ij}$ between the $i$-th and $j$-th variables. Further, we study the spectra of $\mathbf{S}$ for the case of uncorrelated categorical data and compare them with the appropriate random matrix results. For the most part, the spectra of $\mathbf{S}$ follow random matrix theory prescriptions, though the dominant eigenvector deviates due to phase coherence. The eigenvalues and eigenvectors of $\mathbf{S}$ that deviate from random matrix results are seen to signify the presence of correlations that cannot be explained by the randomness assumptions that underlie random matrix theory. As an application of this approach, we use the data on the Indian general elections and atmospheric pressure in the North Atlantic ocean region to study the similarity properties. The former is an